Imagine a warfighter analyzing the battlespace. He implements a system that classifies
remotely sensed objects according to a set of 200 different possible labels. Unknown to
the warfighter or the command element, numerous enemy forces have infiltrated the
battlespace, and so the classification system reports a large group of enemy tanks in exactly
the same spot where friendly forces were previously stationed, i.e., targets that must be
destroyed immediately. The warfighter has never seen such a large concentration of enemy
forces in this particular battlespace and begins to question the results, failing to
immediately order the destruction of those objects. Time ticks by, and it becomes apparent to the
command element and the warfighter that the objects are dangerous enemies only after they
retreat beyond weapons range, and the window of opportunity to act decisively shrinks away.

If briefed beforehand that the classification system could minimize risk based on the
classification cost and battlespace information provided by the command element, the warfighter
would have had more confidence in the classification system, and might have quickly taken
decisive action.

When comparing classification systems to one another via Receiver Operating Characteristic
(ROC) analysis, some comparison methods do not consider the whole picture, i.e., costs
and class prevalences along with the class-conditional probabilities. Because the volume
under a ROC surface in a 200-class case would be a 39,800-dimensional object, concepts
such as Volume Under the Surface (VUS) become rather cumbersome. Most attempts to
generalize geometric concepts to the general n-class case choose to ignore either the class
prevalences or the costs. The concept of risk allows a much more robust form of ROC
analysis to take place, one which considers many more of the characteristics of the operating
environment in which the receiver of information resides.

Recent research into comparison methods for classification systems explores continuous
fixed-support joint distributions of class prevalences as weighting functions to deal with
classification domains about whose class prevalences one has limited or no knowledge.
Using empirical statistical methods, a warfighter can calculate the probability and expected
cost or loss appropriate to each type of classification decision, assuming costs to be
subjectively fixed, and that acceptable estimates for class-conditional probabilities exist. As
the sum of the products of cost and probability for all types of classification decisions,
total classification risk for a classification system is easily calculated. Empirical risk data
produced by statistical simulation of the battlespace lends itself to statistical description
of total classification risk for comparison with other classification systems.

An example of a joint distribution for class prevalences over a standard simplex is
provided by way of a multivariate triangular distribution. Families of classification systems
are created using Probabilistic Neural Nets (PNNs) acting on the Moving and Stationary
Target Acquisition and Recognition (MSTAR) mixed targets data set. The spread parameter
of the PNNs serves as one threshold distinguishing the PNN classification systems from
one another, and a second parameter is a cropping proportion used in processing the image
data. Using computer simulation, a warfighter can choose a threshold that minimizes risk
under the assumption of temporarily fixed costs.


Nomenclature

AER                                Actual Error Rate
AUC                                Area Under the ROC Curve
$p_j$                              Class Prevalence
$c_{ij}$                           Classification Cost
$C$                                Classification Cost Matrix, $[C]_{ij} = c_{ij}$
$\pi_{ij}(A)$                      Classification Probability
$\Pi(A)$                           Classification Probability Matrix, $[\Pi]_{ij} = \pi_{ij}$
$R_C(A)$                           Classification Risk
$A: E \to L$                       Classification System
$q_{i|j}(A)$                       Class-conditional Probability
$Q(A)$                             Conditional Probability Matrix, $[Q(A)]_{ij} = q_{i|j}(A)$
ERRT                               Empirical ROC Risk Threshold
$\langle \cdot, \cdot \rangle_F$   Frobenius Inner Product
$\odot$                            Hadamard Product Operator
$P$                                Prevalence Matrix: each row identical to transposed vector $p^T$
                                   of Class Prevalences $\{p_j\}_{j=1}^{n}$
PNN                                Probabilistic Neural Net
ROC                                Receiver Operating Characteristic
RRF                                ROC Risk Functional
$\theta$                           Threshold Parameter
$\Theta$                           Threshold Set
VUS                                Volume Under the ROC Surface

Subscripts

$i$                                Refers to label $\ell_i \in L$
$j$                                Refers to class $\mathcal{E}_j \subset E$
$i|j$                              Label $\ell_i \in L$ given class $\mathcal{E}_j$

Note: Familiarity with mathematical symbols such as $\in$, $\cup$, $\cap$, $\subset$, $\Rightarrow$,
and $\forall$ is recommended.


I.  Introduction 

The process of classification requires an algorithm known as a classification system. Given a sample
space $E$ of possible outcomes or events, along with a finite set $L = \{\ell_1, \ell_2, \ell_3, \ldots, \ell_n\}$ of distinct labels, a
function $A: E \to L$ is a classification system. An “n-class system” has exactly $n$ labels ($n = 1, 2, 3, \ldots$); for
example, an Identification, Friend or Foe (IFF) 3-class system labels objects as “friendly,” “unfriendly,”
or “unrecognized.”

A classification system may have a threshold parameter $\theta$, selected from a finite-dimensional threshold set
$\Theta$ of parameters that may influence classification. For example, when classifying adults into men and women
based on height alone, a single-dimensional threshold parameter $\theta$ might be a point between the median
men’s and women’s heights, where if Height(adult) $> \theta$, we label the adult as “man”; otherwise, “woman”.
If classification entails using a measuring tape, then a two-dimensional threshold set $\Theta = \Theta_1 \times \Theta_2$ might
contain ordered pairs $\theta = (\theta_1, \theta_2)$, where $\theta_1$ is the height chosen above and $\theta_2$ the tension on the tape.

When facing a decision of where to set threshold parameters, Bayesian decision theory suggests applying
the concept of risk [2, 5]. To calculate the risk $R_C(A_\theta)$ of a classification system $A_\theta$ (that is, a general
classification system $A$, specified by a particular choice of $\theta$ from the threshold set), we must know each
possible outcome and its associated cost and probability. We rely on the formula for classification risk $R_C$
(suppressing notational dependence on $\theta$), given by [3]

$$R_C(A) = \langle \Pi(A), C \rangle_F \tag{1}$$

where $\Pi(A)$ is a matrix of classification probabilities, $C$ is a matrix of classification costs (assumed
temporarily fixed), and $\langle \cdot, \cdot \rangle_F$ represents the action of the Frobenius inner product, given by [5]

$$\langle U, V \rangle_F = \sum_{i=1}^{s} \left( \sum_{j=1}^{r} u_{ij} v_{ij} \right) \tag{2}$$

where $[U]_{ij} = u_{ij}$ and $[V]_{ij} = v_{ij}$ are any two matrices of the same size $s \times r$. For an $n$-class system, with
exactly $n^2$ types of possible classification decisions, $\Pi(A)$ and $C$ are both $n \times n$.
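
As a minimal sketch of the Frobenius inner product in (2), the following Python function multiplies two equally sized matrices entry by entry and sums the results. The function name `frobenius_inner` and the two small matrices are illustrative assumptions, not part of the original work.

```python
def frobenius_inner(u, v):
    """Return the Frobenius inner product of two equally sized matrices.

    Matrices are lists of rows; the result is the sum over all (i, j)
    of u[i][j] * v[i][j].
    """
    if len(u) != len(v) or any(len(ru) != len(rv) for ru, rv in zip(u, v)):
        raise ValueError("matrices must have the same shape")
    return sum(a * b for ru, rv in zip(u, v) for a, b in zip(ru, rv))


# Hypothetical 2 x 2 matrices chosen only to exercise the function.
u = [[0.5, 0.25],
     [0.0, 0.25]]
v = [[0.0, 1.0],
     [4.0, 0.0]]
inner = frobenius_inner(u, v)  # only the (1, 2) entries contribute here
```

Because the operation is a single pass over the entries, its cost grows with the number of matrix entries, which is what makes repeated risk calculations inexpensive.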

We now define $\Pi(A)$ and $C$ in precise mathematical terms, beginning with necessary preliminaries.

A.  Conditional  Probability  Matrix 

The elements $[Q(A)]_{ij} = q_{i|j}(A)$ of a conditional probability matrix are class-conditional probabilities. Point
estimates of class-conditional probabilities for a classification system are calculated by means of a confusion
matrix (the most accurate and reliable being one produced using Lachenbruch’s holdout procedure [1, 4]). To
illustrate, consider a 2 × 2 contingency matrix of results from a classification experiment where we have
explicit knowledge of the number of items in each population. A matrix such as that shown in Table 1 is a
simple tally of the numbers of each type of classification decision, both correct and incorrect, with correct
decisions on the diagonal and columns corresponding to truth. Here, class 1 is positive, and class 2 negative;
hence, the true positive count TP is how many items of class 1 were correctly labeled, and the false negative
count FN is how many were not, and so forth [3, 6].


Table 1. Two-Class Contingency Matrix.

Contingency Matrix      Actual Class: 1     Actual Class: 2
Labeled Class: 1              TP                  FP
Labeled Class: 2              FN                  TN


From this matrix, form estimates of the class-conditional probabilities by dividing each element in a
column by the number of items in the class corresponding to truth for that column; with $M_1$ and $M_2$ items
from Classes 1 and 2, respectively, estimates of class-conditional probabilities appear in Table 2.


Table 2. Two-Class Confusion Matrix.

Confusion Matrix        Actual Class: 1     Actual Class: 2
Labeled Class: 1           TP / $M_1$          FP / $M_2$
Labeled Class: 2           FN / $M_1$          TN / $M_2$


The result is a transpose stochastic confusion matrix, such that the sum of each column is one; therefore,
the information contained in a 2 × 2 confusion matrix may be represented by an ordered pair composed
of one entry from each column, which may then be plotted on a unit square. As the convex hull of plotted
points from many such classification systems, a curve known as a Receiver Operating Characteristic (ROC)
curve is formed and may be analyzed through measures such as the Area Under the ROC Curve (AUC), or
Volume Under the ROC Surface (VUS) for classification systems with more than two classes; however, such
traditional methods of ROC analysis do not inherently facilitate the calculation of classification risk [6].
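
The column normalization that turns a contingency matrix into a confusion matrix can be sketched as follows. The function name `class_conditional_estimates` and the tallies are hypothetical; the layout assumes rows are assigned labels and columns are true classes, matching the text's convention that columns correspond to truth.

```python
def class_conditional_estimates(counts):
    """Divide each column of a square contingency matrix by its column total.

    counts[i][j] is the number of items of true class j that received
    label i, so column j totals M_j, the number of items in class j.
    """
    n = len(counts)
    totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    if any(t == 0 for t in totals):
        raise ValueError("every class needs at least one item")
    return [[counts[i][j] / totals[j] for j in range(n)] for i in range(n)]


# Hypothetical tallies with M_1 = M_2 = 50.
#          class 1  class 2
counts = [[40, 5],     # labeled 1: TP, FP
          [10, 45]]    # labeled 2: FN, TN
q_hat = class_conditional_estimates(counts)
```

Each column of `q_hat` sums to one, so in the two-class case one entry per column (for example TP / M_1 and FP / M_2) carries all of the information and can be plotted as a point in the unit square.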

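Given a classification system $A: E \to L$, along with a probability measure $P: \mathcal{E} \to [0, 1]$ defined on a
$\sigma$-field $\mathcal{E}$ over $E$ containing all pre-images $A^{\natural}[\{\ell_i\}] \subset E$ of singleton label subsets $\{\ell_i\} \subset L$ (where the
becuadro (natural sign), $\natural$, denotes the set function $A^{\natural}$ from subsets of $L$ to subsets of $E$ that returns
pre-images under $A$) and all classes in the partition $\bigcup_{j=1}^{n} (\mathcal{E}_j) = E$ induced by $L$ on $E$, the class-conditional
probability $q_{i|j}(A)$ is the conditional probability that $A(e) = \ell_i$, given that $e \in \mathcal{E}_j$, and is given by [5]

$$q_{i|j}(A) = P\left(A(e) = \ell_i \mid e \in \mathcal{E}_j\right) = P\left(e \in A^{\natural}[\{\ell_i\}] \mid e \in \mathcal{E}_j\right) = \frac{P\left(A^{\natural}[\{\ell_i\}] \cap \mathcal{E}_j\right)}{P(\mathcal{E}_j)}, \quad i, j = 1, 2, 3, \ldots, n \tag{3}$$

when class $\mathcal{E}_j$ has $P(\mathcal{E}_j) \neq 0$. For a class $\mathcal{E}_j$ with prior probability $P(\mathcal{E}_j) = 0$, all class-conditional
probabilities conditioned on $\mathcal{E}_j$ are given by $q_{i|j}(A) = 0$, $\forall\, i = 1, 2, 3, \ldots, n$. A class-conditional probability
may take on any value in [0, 1]; thus, for each $i$ and each $j$ with $P(\mathcal{E}_j) \neq 0$, $q_{i|j}(A)$ is a well-defined
probability measure with $\sum_{i=1}^{n} q_{i|j}(A) = 1$, $\forall\, j = 1, \ldots, n$ (i.e., each column of $Q$ sums to one).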


B.  Prevalence  Matrix 


The elements $[P]_{ij} = p_j$ of a prevalence matrix are class prevalences. If we multiply (3) above by the
probability $p_j = P(\mathcal{E}_j)$ of being in class $\mathcal{E}_j$, or the class prevalence of $\mathcal{E}_j$, the probability of the classification
system labeling an outcome $e \in \mathcal{E}_j$ with $\ell_i$ immediately results; therefore, to make the calculation of such
probabilities as simple as possible, the prevalence matrix $P$ for an $n$-class system is given by [6]

$$P = \begin{bmatrix} p^T \\ \vdots \\ p^T \end{bmatrix}_{n \times n} = \begin{bmatrix} p_1 & \cdots & p_n \\ \vdots & & \vdots \\ p_1 & \cdots & p_n \end{bmatrix} \tag{4}$$

where $p^T$ is the transposed vector of class prevalences $\{p_j\}_{j=1}^{n}$ to which each row of $P$ is identical. Note
that since $\bigcup_{j=1}^{n} (\mathcal{E}_j)$ is a partition of $E$, $\sum_{j=1}^{n} p_j = 1$. In other words, each row of the stochastic matrix $P$
sums to one and is identical to all other rows. It is therefore not necessary to label an element of $P$ with
two subscripts as usual, so we subscript elements of $P$ according to the column in which they reside.


C.  Classification  Probability  Matrix 

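The elements $[\Pi(A)]_{ij} = \pi_{ij}(A)$ of a classification probability matrix are classification probabilities given
by $\pi_{ij}(A) \equiv q_{i|j}(A)\, p_j = P\left(A^{\natural}[\{\ell_i\}] \cap \mathcal{E}_j\right)$; in other words, the joint probability that an outcome $e$ belongs
to class $\mathcal{E}_j$ and receives the label $A(e) = \ell_i$. Since all
possible outcomes are accounted for, the elements of $\Pi(A)$ sum to one.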
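
The statement that the classification probabilities sum to one can be checked in one line from the definitions above. The step below assumes every class has nonzero prevalence, so that each column of the conditional probability matrix sums to one:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \pi_{ij}(A) = \sum_{j=1}^{n} p_j \sum_{i=1}^{n} q_{i|j}(A) = \sum_{j=1}^{n} p_j \cdot 1 = 1$$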

The Hadamard product operator is given by

$$[U \odot V]_{ij} = u_{ij} v_{ij} \tag{5}$$

where $[U]_{ij} = u_{ij}$ and $[V]_{ij} = v_{ij}$ are any two matrices of the same size. A classification probability matrix is
therefore given by $\Pi(A) = Q(A) \odot P$, and so (1) becomes $R_C(A) = \langle \Pi(A), C \rangle_F = \langle [Q(A) \odot P], C \rangle_F$.

D.  Classification  Cost  Matrix

The elements $[C]_{ij} = c_{ij}$ of a classification cost matrix are numerical classification costs associated with
the classification decisions whose probabilities appear in the corresponding positions of $\Pi(A)$. A general
rule of thumb is that correct decisions have a cost of zero (i.e., the diagonal elements of $C$ are all zero) and
incorrect decisions have a positive cost value; however, the framework presented above does not require any
such restrictions on the methodology used to apply the cost concept.

Costs are assumed to be authoritative and based on a subjective assessment of the inherent losses
(according to some relative numerical scale) associated with making each possible kind of classification decision.
For example, one may assume there is no cost associated with classifying a woman as a woman or a man as
a man, but the cost associated with classifying a woman as a man may not be exactly equal to that of
classifying a man as a woman. Costs may change, but as the framework we present allows for near-real-time
re-calculation of classification risk whenever the cost structure changes, we assume them temporarily fixed.

To  calculate  classification  risk,  simply  add  all  products  of  cost  and  probability  together,  as  in  (1)  above. 
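
To make the full calculation concrete, here is a minimal sketch that scales each column of a conditional probability matrix by its class prevalence (the Hadamard product with the prevalence matrix) and then sums cost times probability. The function name `classification_risk` and every number below are illustrative assumptions, not values from the study.

```python
def classification_risk(q, p, cost):
    """Risk as the Frobenius inner product of (Q Hadamard P) with C.

    q[i][j]    class-conditional probability of label i given class j
    p[j]       prevalence of class j (every row of the prevalence matrix is p)
    cost[i][j] cost of assigning label i to an item of class j
    """
    n = len(p)
    # Hadamard product with the prevalence matrix: scale column j by p[j].
    pi = [[q[i][j] * p[j] for j in range(n)] for i in range(n)]
    return sum(pi[i][j] * cost[i][j] for i in range(n) for j in range(n))


q = [[0.8, 0.1],
     [0.2, 0.9]]        # hypothetical estimates; each column sums to one
p = [0.25, 0.75]        # hypothetical class prevalences
cost = [[0.0, 2.0],
        [5.0, 0.0]]     # hypothetical costs, zero for correct decisions

risk = classification_risk(q, p, cost)

# Changing the cost structure only requires repeating the final sum.
doubled = classification_risk(q, p, [[2 * c for c in row] for row in cost])
```

In this example only the two off-diagonal decisions carry cost, so the risk is 0.1 × 0.75 × 2 + 0.2 × 0.25 × 5, roughly 0.4.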

II.  Method 

The calculation of classification risk is very simple if all quantities involved are constants; however,
depending on the environment in which classification occurs, there may be significant variability in the
classification probabilities. Note that elements of the matrices $C$, $P$, $Q(A)$, and $\Pi(A)$ are actually random
variables over the sample space $E$; however, for the purposes of this paper, we assume costs $c_{ij}$ to be constant
random variables. The variables that may have the greatest effect on the classification process are the class
prevalences, since they are part of the definition of the class-conditional probabilities and there is statistical
dependence between $p_j$ and $q_{i|j}$ in many cases [6]. In addition, the class prevalences are a function of the
environment in which classification occurs, and if this is the physical world, such variables may tend to be
extremely unpredictable. However, limited knowledge based on expert opinion is better than no knowledge
at all.

Recent work on the subject of calculating classification risk in an uncertain classification environment
illustrates the framework for constructing a joint distribution of class prevalences, with the restriction that
no class may have zero population density. The work employs classical statistical methods to select a
classification system based on a point estimate of classification risk via the ROC Risk Functional (RRF) [5, 6]

$$\arg\min_{\theta \in \Theta} \left\{ E\left[ R_C(A_\theta) \right] \right\} = \arg\min_{\theta \in \Theta} \left\{ E\left[ \langle [Q(A_\theta) \odot P], C \rangle_F \right] \right\} \tag{6}$$

where a family $\mathcal{A}_\Theta = \{ A_\theta : \theta \in \Theta \}$ of classification systems is defined over a threshold set $\Theta$ of parameters.
The RRF relies heavily on assumptions of statistical independence to allow a quick calculation, but these
assumptions do not hold up to scrutiny [6]. Therefore, we propose the Empirical ROC Risk Threshold (ERRT)

$$\theta^* = \arg\min_{\theta \in \Theta} \left\{ E\left[ R_C(A_\theta) \right] + D\left[ R_C(A_\theta) \right] \right\} \approx \arg\min_{\theta \in \Theta} \left\{ E\left[ \langle \hat{\Pi}(A_\theta), C \rangle_F \right] + D\left[ \langle \hat{\Pi}(A_\theta), C \rangle_F \right] \right\} \tag{7}$$

where $\hat{\Pi}(A_\theta)$ is an acceptable estimate of $\Pi(A_\theta)$, and where $E[R_C(A_\theta)]$ and $D[R_C(A_\theta)]$ are robust
measures of central tendency and dispersion for risk, respectively. Stated simply, we choose any threshold
parameter $\theta^*$ such that the quantity $E[R_C(A_{\theta^*})] + D[R_C(A_{\theta^*})]$ is a minimum over all $\theta \in \Theta$.

With no assumptions regarding the nature of statistical distributions for the variables involved (except for
costs, as mentioned above), we use statistical simulation to produce and compare values of the comparative
risk quantity $E[R_C(A_\theta)] + D[R_C(A_\theta)]$ for as many choices of the parameter $\theta$ as suit our needs (i.e., we
define the threshold set $\Theta$ to be of convenient size). The ERRT has the advantage of considering dispersion,
or variability, in addition to a measure of central tendency for the classification risk, unlike the RRF. This
allows a given threshold parameter with a slightly higher measure of central tendency for risk to still compete
against other threshold parameters if its statistical dispersion of risk is smaller. Threshold parameters with
extremely high statistical dispersion of risk are essentially eliminated from the competition, which tends to
result in the selection of a system with smaller statistical dispersion.
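
The selection rule can be sketched in a few lines, assuming the median as the measure of central tendency and the median absolute deviation from the median as the measure of dispersion. The function names `mad` and `errt`, the threshold pairs, and the simulated risk values are hypothetical and serve only to show the trade-off described above.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median(abs(x - m) for x in values)


def errt(risks_by_theta):
    """Return the threshold whose median risk plus MAD is smallest.

    risks_by_theta maps each candidate threshold (any hashable value)
    to a list of simulated risk values for that threshold.
    """
    score = {theta: median(r) + mad(r) for theta, r in risks_by_theta.items()}
    best = min(score, key=score.get)
    return best, score


# Hypothetical simulated risks for two (spread, cropping ratio) pairs.
risks = {
    (0.5, 0.6): [0.30, 0.31, 0.29, 0.30],  # slightly higher median, tight
    (1.0, 0.6): [0.28, 0.10, 0.50, 0.30],  # slightly lower median, dispersed
}
best, score = errt(risks)
```

Here the first pair is selected even though its median risk is a little higher, because its dispersion is much smaller than that of the second pair.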

To  illustrate,  we  classify  the  8-class  MSTAR  Mixed  Targets  data  set,  using  between  six  and  seven 
thousand  total  data  points,  varying  by  image  cropping  factor.  We  use  a  Probabilistic  Neural  Net  (PNN) 
classifier trained on standardized data containing the first two principal components from a set of processed
data [5]. Processed data for an MSTAR image include the eccentricity of an ellipse fit to the convex hull
of  edges  detected  using  Matlab®  in  a  mathematically  transformed,  cropped,  noise-reduced  version  of  the 
image  (images  with  no  suitable  edges  are  excluded).  Each  data  point  also  contains  numerical  values  from 
the  binary  file  header  characterizing  the  target  image,  and  which  were  gathered  simultaneously  with  the 
synthetic  aperture  radar  image  pixels,  such  as  bandwidth  and  dynamic  range.  The  first  principal  component 
of  the  processed  data  set,  heavily  loaded  against  the  “X_Velocity”  data  from  the  file  header,  accounts  for 
99.98%  of  overall  variability,  and  the  second  is  loaded  against  eccentricity  as  described  above.  The  data  are 
standardized  for  use  with  the  PNN  classifier,  which  employs  a  common  “spread”  parameter  as  the  standard 
deviation  of  a  multivariate  normal  marginal  distribution  about  each  training  data  point,  as  illustrated  in 
Figure  1  for  a  two-dimensional  data  set.  Standardization  ensures  experimentation  with  spreads  greater  than 
one  will  not  produce  different  results  than  a  spread  of  one  when  implementing  the  PNN  classifier. 
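
For readers unfamiliar with the role of the spread parameter, the following is a minimal sketch of a PNN-style decision rule: an isotropic Gaussian with a common standard deviation is centered on each training point, and a new point receives the label of the class with the largest average kernel value. This is a generic textbook form with equal class weighting, not the implementation used in the study; the function name `pnn_classify` and the toy training data are assumptions.

```python
import math


def pnn_classify(x, training, spread):
    """Label x with the class whose average Gaussian kernel value is largest.

    training maps each class label to a list of feature tuples; spread is
    the common standard deviation of the Gaussian placed on every
    training point. The shared normalizing constant is omitted because
    it does not change which class scores highest.
    """
    def kernel(a, b):
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2.0 * spread ** 2))

    scores = {
        label: sum(kernel(x, t) for t in points) / len(points)
        for label, points in training.items()
    }
    return max(scores, key=scores.get)


# Hypothetical two-dimensional, two-class training set.
training = {
    "A": [(0.0, 0.0), (0.2, 0.1)],
    "B": [(2.0, 2.0), (2.1, 1.8)],
}
label_near_a = pnn_classify((0.1, 0.0), training, spread=0.5)
label_near_b = pnn_classify((1.9, 2.0), training, spread=0.5)
```

A smaller spread makes each kernel more sharply peaked around its training point, while a larger spread smooths the class densities together.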


[Figure 1 panels: “Three Two-Dimensional Gaussians, σ = 0.5” and “Three Two-Dimensional Gaussians, σ = 1.0”]

Figure 1. Effects of Differing Spread Parameters for a PNN classifier.


For the purpose of this illustration, we use a two-dimensional threshold set. The chosen PNN spread is
along the first parameter axis, and the second parameter axis is a proportion, namely, the ratio of the small
side of the remaining image to the original square image side length after cropping appropriately rotated
images with a golden section rectangle during the first step in data processing. Proportions of the golden
section are used for cropping rectangles due to the rectangular nature of the image targets (mostly tanks or
other such vehicles), with the rectangles placed as close as possible to the exact center of the images.

Given target image types BTR-60, 2S1, BRDM-2, D7, T62, ZIL-131, ZSU-23/4, and SLICY, we label
these classes one through eight, respectively, and simulate class prevalences for the first seven by means
of a jointly triangular distribution over a standard 7-simplex [6]. The triangular marginal distributions for
classes 1 through 7 each have support on [0, 1], with modes evenly spaced every tenth from … to …,
respectively. The prevalence of the eighth class (SLICY, a fabricated control target) is then given by the
difference between one and the sum of the first seven class prevalences previously drawn randomly from their
joint distribution [6]. We simulate four random draws of all eight theoretical class prevalences in this way to
account for variability of the class prevalences, and for each of these four random draws we further simulate
another four random selections of individual data points from the standardized principal component data
according to the theoretical prevalences drawn, to account for variations within the MSTAR data. Although
all classes must have non-zero population density for a classification label to have meaning, a classification
system operating in the “real world” may not encounter any items of a certain class over a finite time period.
We disallow random theoretical class prevalence draws that round to an actual draw of zero population for
any class, to more aptly illustrate application of the risk-based comparison theory.
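
One simple way to mimic this kind of prevalence simulation is rejection sampling: draw triangular values for all classes but the last, keep the draw only if they leave a positive remainder for the last class, and reject any draw that would round to zero items for some class. This is a stand-in rather than the joint distribution of the cited work (conditioning on the sum changes the marginals), and the function names, the three-class setup, and the modes 0.2 and 0.3 are all assumptions for illustration.

```python
import random


def draw_prevalences(modes, rng):
    """Draw one prevalence vector for len(modes) + 1 classes.

    Each class except the last gets a triangular draw on [0, 1] with the
    given mode; draws whose total reaches one are rejected, and the last
    class receives the remainder.
    """
    while True:
        head = [rng.triangular(0.0, 1.0, m) for m in modes]
        remainder = 1.0 - sum(head)
        if remainder > 0.0:
            return head + [remainder]


def class_counts(prevalences, total):
    """Round prevalences to item counts for a sample of the given size."""
    return [round(p * total) for p in prevalences]


def draw_nonempty(modes, total, rng):
    """Repeat the draw until every class would contribute at least one item."""
    while True:
        p = draw_prevalences(modes, rng)
        if min(class_counts(p, total)) > 0:
            return p


rng = random.Random(0)          # fixed seed so the sketch is repeatable
prevalences = draw_nonempty([0.2, 0.3], total=100, rng=rng)
```

With many classes and large modes, most draws are rejected, so a practical simulation would likely need a more efficient construction than this sketch.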


III.  Results 

We  employ  three  different  classification  cost  matrices  to  illustrate  the  effect  of  cost  on  risk,  with  the 
median  and  the  median  absolute  deviation  from  the  median  as  measures  of  central  tendency  and  dispersion 
for  risk,  respectively.  A  standard  cost  matrix,  such  as  the  one  appearing  in  Table  3,  has  zeroes  on  the  diagonal 
and ones everywhere else, and risk calculations with this matrix yield the Actual Error Rate (AER) of the
classification system when the Lachenbruch holdout procedure is used for classifier training and validation [1, 4].
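
The link between the standard cost matrix and an error rate can be written out explicitly. Writing the standard costs as $c_{ij} = 1 - \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta (notation introduced here only for illustration), and using the fact that the classification probabilities sum to one:

$$R_C(A) = \sum_{i=1}^{n} \sum_{j=1}^{n} \pi_{ij}(A)\,(1 - \delta_{ij}) = \sum_{i=1}^{n} \sum_{j=1}^{n} \pi_{ij}(A) - \sum_{i=1}^{n} \pi_{ii}(A) = 1 - \sum_{i=1}^{n} \pi_{ii}(A)$$

That is, with standard costs the risk reduces to the total probability of an incorrect classification decision.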


Table 3. Standard Cost Matrix.

A “low” cost matrix chosen to illustrate the effect of cost on risk appears in Table 4. Figures 2 and 3
show that the same ordered threshold parameter pair (…, …) yields minimal comparative risk for both the
standard and “low” cost matrices.


Table 4. Low Cost Matrix.

A “high” cost matrix chosen to illustrate the inherent flexibility of the definition of cost appears in Table
5. Decision-makers who provide guidance on cost are free to choose numbers in any way that suits them. For
instance, ones along the diagonal (for correct classification decisions) could indicate that there is a financial
cost incurred due to the act of classification, regardless of the outcome, and so costs for incorrect decisions
might be scaled according to the “cost” of a correct decision. Figure 4 illustrates how the entire surface is
generally elevated above those of Figures 2 and 3 as a result.

Figure 2. Comparative Risk Surface Over a Two-Dimensional Threshold Set, Standard Costs. (Axes: Cropping Ratio, Spread.)

Figure 3. Comparative Risk Surface Over a Two-Dimensional Threshold Set, Low Costs. (Plot title: MSTAR Data Classification Risks, Low Costs.)

Note that for the “high” cost matrix, an ordered threshold parameter pair (…, …), distinct from the
pair (…, …) selected when using either the standard or the “low” cost matrix, yields minimal comparative
risk. Although the example may be extreme, it does serve to illustrate that cost can be an important factor
in risk-based comparison of classification systems. However, even if standard cost matrices are always used,
thereby producing only the AER of a classification system, the empirical statistical method presented still
provides a sound basis for decision-making.

Table 5. High Cost Matrix.

Figure 4. Comparative Risk Surface Over a Two-Dimensional Threshold Set, High Costs.


IV.  Conclusion 

The result of the theory presented is that risk can be calculated very quickly once the computationally
intensive statistical simulation of the classification environment (i.e., the “battlespace”) is completed. The
assumption that costs are fixed is acceptable because calculation of the Frobenius inner product of any two
matrices (even two of dimension 200 × 200, for example) can be performed in near-real time by any
computer or capable pocket calculator, so risk can be re-calculated whenever costs change. This allows end
users of a classification algorithm to have confidence
in decisions made by the system, even (and perhaps especially) when those decisions are surprising, because
all possible outcomes, and their associated costs and probabilities, are inherently considered and accounted
for by a risk-based approach.